A significant part of the largest Knowledge Graph today, the Linked Open Data cloud, consists of metadata about documents such as publications, news reports, and other media articles. While widespread access to this document metadata is a tremendous advancement, it is not yet easy to assign semantic annotations and organize the documents along semantic concepts. Providing semantic annotations, such as concepts in SKOS thesauri, is a classical research topic, but it is typically conducted on the full text of the documents. For the first time, we offer a systematic comparison of classification approaches to investigate how far semantic annotations can be conducted using just the metadata of the documents, such as titles published as labels on the Linked Open Data cloud. We compare the classifications obtained from analyzing the documents' titles with semantic annotations obtained from analyzing the full text. Apart from the prominent text classification baselines kNN and SVM, we also compare recent techniques of Learning to Rank and neural networks and revisit the traditional methods logistic regression, Rocchio, and Naive Bayes. The results show that across three of our four datasets, the performance of the classifications using only titles reaches over 90% of the quality achieved when using the full text. Thus, conducting document classification using just the titles is a reasonable approach for automated semantic annotation and opens up new possibilities for enriching Knowledge Graphs.